Abstract: We propose a general model, based on statistical techniques, that relates the extrasolar planets (or exoplanets) detected up to June 2008 to the main characteristics of their host stars. The main goal is to establish a mathematical relation among the set of variables that best describe the physical characteristics of the host star and of the planet itself. The host star is characterized by its distance, age, effective temperature, mass, metallicity, radius and magnitude. The exoplanet is described by its physical parameters (radius and mass) and its orbital parameters (distance, period, eccentricity, inclination and semimajor axis). As a first approach we assume that only the mass of the exoplanet is determined by the physical properties of its host star. The proposed model is then validated through statistical analysis.
1 Introduction

An extrasolar planet (or exoplanet) is a planet that orbits a star other than the Sun and therefore belongs to a planetary system other than our Solar System. The first extrasolar planet around a main-sequence star was discovered in 1995 (Mayor and Queloz, 1995). Currently more than 300 exoplanets have been documented, most of them with masses greater than Jupiter's (Schneider, 2009). Detecting an exoplanet is very difficult: planets emit little radiation of their own and are lost in the glare of their far brighter host stars, so ordinary telescopic observation can rarely be used. To find exoplanets, a variety of techniques are therefore used, such as radial velocity, pulsar timing, astrometry, gravitational lensing, spectrometry and photometry (De Pater and Lissauer, 2001). Each method aims to detect the effect produced by the exoplanet on its stellar system. Besides making discoveries, it is important to search for models that can explain the origin, formation and possible migration of these bodies. For example, Rice and Armitage (2005) investigated how the statistical distribution of extrasolar planets may be combined with knowledge of the host stars' metallicity to yield constraints on the migration histories of gas giant planets. Moreover, a series of papers (Udry et al., 2003; Santos et al., 2003; Eggenberger et al., 2004; Halbwachs et al., 2005) studied the emerging properties of planet-host stars and the characteristics of the different orbital-element distributions of exoplanetary systems. In this work we analyze cross-sectional data for the exoplanets detected up to June 2008 using linear regression techniques. The purpose of this analysis is to test for a relation between the host star and its orbiting planet: for example, whether the planet's mass is strongly determined by the type of host star, which would indicate that the star affects the planetary formation stage.

1.1 Characteristics of the data catalog: Stars and Planets 

The catalog was created in February 1995 to support the progress of the new field of exoplanetology by publishing recent detections and their associated data. The catalog is interactive and is available at http://exoplanet.eu.

As of June 2008 the catalog contained 303 exoplanets in 259 planetary systems (31 of them multiple systems). Two important selection criteria are: 1) the mass of the exoplanet is at most 13 Mj (Jupiter masses), and 2) the data source must be reliable, that is, previously published in refereed journals, presented at conferences, and so on.

• Stars: The stellar data are taken from well-known databases such as SIMBAD or directly from published papers. The basic physical characteristics recorded for a star are: radial velocity, mass, metallicity, age and distance.

• Planets: These data are taken from published papers and from the sites of the Anglo-Australian Planet Search; the California and Carnegie Planet Search; the Geneva Extrasolar Planet Search Programmes; the Transatlantic Exoplanet Survey; and the Department of Astronomy at the University of Texas.

2 The General Model: Multiple Regression Analysis 

We start with the following model (Model A) described by the equation: 

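$$M_P = \alpha_1 + \alpha_2\,\mathrm{DS} + \alpha_3\,\mathrm{AS} + \alpha_4\,\mathrm{TS} + \alpha_5\,\mathrm{MS} + \alpha_6\,\mathrm{METAL} + \alpha_7\,\mathrm{MAG} + \alpha_8\,\mathrm{RS} + u \qquad (1)$$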

where Mp is the exoplanet's mass and the α_i are the coefficients of each term. Eq. (1) expresses Mp in terms of the variables representing the features of the host star: the distance, DS; the age, AS (denoted ES in Table 1 and in Eqs. (2) and (3)); the temperature, TS; the mass, MS; the metallicity, METAL; the magnitude, MAG; and the radius, RS. Finally, u is the random error term.

TABLE 1. Estimated values for the parameters.

Model A (dependent variable: Mp)

Variable      Coefficient   Standard Error   t-statistic   Probability
C               -9.1773         5.0051         -1.8336        0.0685
DS              -0.0113         0.0074         -1.5215        0.1301
ES               0.0288         0.0872          0.3299        0.7419
TS               0.0013         0.0007          1.9169        0.0570
MS               1.9385         1.3721          1.4128        0.1596
METAL           -1.7493         1.2586         -1.3899        0.1664
MAG              0.3689         0.2850          1.2944        0.1973
RS               0.2335         0.1183          1.9747        0.0499

Model B (dependent variable: log(Mp))

Variable      Coefficient   Standard Error   t-statistic   Probability
C               -2.5169         1.0840         -2.3218        0.0213
ES              -0.0493         0.0321         -1.5345        0.1265
TS               0.0003         0.0002          1.7417        0.0831
MS               1.1772         0.3964          2.9698        0.0033
METAL           -1.1370         0.5001         -2.2738        0.0241
SIST*MS         -0.1809         0.2188         -0.8269        0.4093
SIST*METAL       0.5629         0.8338          0.6752        0.5003

We estimate the unknown parameters in Eq. (1) by ordinary least squares (OLS). The results are shown in Table 1, together with the t-statistics and their associated probabilities (p-values) for the tests of coefficient significance. At the 5% significance level, the only significant variable in Model A is RS (p = 0.0499).
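For readers who want to reproduce the layout of Table 1, the following pure-Python sketch fits a linear model by OLS and reports coefficients, standard errors, t-statistics and two-sided probabilities. It is an illustration under stated assumptions, not the software used for this paper (which the text does not name): the data are synthetic, the labels MS and METAL are reused only as names, and the probabilities use a normal approximation to Student's t, which is reasonable only when the number of observations is much larger than the number of coefficients.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS fit of y = X a + u; each row of X starts with 1.0 for the constant C."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # unbiased estimate of Var(u)
    # Standard errors: square roots of the diagonal of s2 * (X'X)^{-1}
    se = [math.sqrt(s2 * solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    t = [c / s for c, s in zip(coef, se)]
    # Two-sided probabilities, normal approximation to Student's t (large n - k)
    p = [math.erfc(abs(tj) / math.sqrt(2.0)) for tj in t]
    return coef, se, t, p, resid


# Synthetic data, NOT the exoplanet catalog
random.seed(1)
X, y = [], []
for _ in range(200):
    ms, metal = random.uniform(0.5, 1.5), random.gauss(0.0, 0.2)
    X.append([1.0, ms, metal])
    y.append(-2.5 + 1.2 * ms - 1.1 * metal + random.gauss(0.0, 0.3))

coef, se, t, p, resid = ols(X, y)
for name, c, s, tj, pj in zip(["C", "MS", "METAL"], coef, se, t, p):
    print(f"{name:8s}{c:10.4f}{s:10.4f}{tj:10.4f}{pj:10.4f}")
```

Solving the normal equations directly keeps the sketch dependency-free; for real work a QR-based solver from a numerical library is numerically safer.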

2.1 Verification of the linear regression assumptions (Model A) 

1. Linearity: The model passes all the Ramsey RESET tests for linearity, so we conclude that the proposed functional form is adequate.

2. Omitted Variables: According to star formation theory, the variables MS and METAL must be included to explain the relation between the mass of the exoplanet and its host star.

3. Multicollinearity: There is a possible weak correlation between MAG and DS.

4. Heteroskedasticity: The White test on the residuals does not reject homoskedasticity, so the residuals can be taken as homoskedastic.

5. Normality: The value of the Jarque-Bera statistic indicates that the residuals are not normally distributed.

6. Homogeneity: Using the dummy variable SIST (0 if the exoplanet belongs to a single planetary system, 1 if it belongs to a multiple planetary system), we conclude that the model is homogeneous.
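Item 5 relies on the Jarque-Bera statistic, JB = n/6 [S² + (K - 3)²/4], where S and K are the sample skewness and kurtosis of the residuals. A minimal sketch, assuming the usual moment-based (biased) estimators of S and K and the asymptotic χ² distribution with 2 degrees of freedom, whose survival function is exactly exp(-x/2); the residuals in the usage lines are synthetic:

```python
import math
import random


def jarque_bera(resid):
    """Return (JB, p-value) for a sequence of regression residuals."""
    n = len(resid)
    mean = sum(resid) / n
    d = [e - mean for e in resid]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    skew = m3 / m2 ** 1.5       # sample skewness S
    kurt = m4 / m2 ** 2         # sample kurtosis K (equals 3 for a normal law)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB ~ chi-square(2) asymptotically; its survival function is exp(-x/2)
    return jb, math.exp(-jb / 2.0)


random.seed(0)
normal_resid = [random.gauss(0.0, 1.0) for _ in range(500)]   # synthetic, normal
skewed = [random.expovariate(1.0) for _ in range(500)]        # synthetic, right-skewed
print(jarque_bera(normal_resid))
print(jarque_bera(skewed))
```

A small p-value, as for the skewed sample, corresponds to the conclusion reported for Model A: normality of the residuals is rejected.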

A statistical model must satisfy all the assumptions above to be correctly specified. Model A does not (in particular, its residuals are not normally distributed), so it needs some modification, for example another functional form and/or an adequate "dummy" variable. This leads to Model B:



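$$\log(M_P) = \alpha_1 + \alpha_2\,\mathrm{ES} + \alpha_3\,\mathrm{TS} + \alpha_4\,\mathrm{MS} + \alpha_5\,\mathrm{METAL} + \gamma_1\,\mathrm{SMS} + \gamma_2\,\mathrm{SMET} + u_i \qquad (2)$$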

where SMS = SIST * MS and SMET = SIST * METAL are two new variables that account for the fact that the exoplanet can belong to a single or a multiple planetary system. The parameters are estimated by OLS and the results are summarized in Table 1.
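The interaction terms can be built row by row before running OLS. The sketch below is hypothetical: the function name, argument order and the use of the natural logarithm are assumptions (the text writes log(Mp) without stating the base, and the coefficient values in Table 1 depend on that choice).

```python
import math


def model_b_row(mp, es, ts, ms, metal, sist):
    """Hypothetical helper: return (log(Mp), regressors) for one planet under Model B.

    sist is the dummy SIST: 0 for a single planetary system, 1 for a multiple one.
    Natural log is an assumption; the text does not state the base of log(Mp).
    """
    if sist not in (0, 1):
        raise ValueError("SIST must be 0 or 1")
    sms = sist * ms       # SMS  = SIST * MS
    smet = sist * metal   # SMET = SIST * METAL
    # Column order follows Eq. (2) and Table 1: constant, ES, TS, MS, METAL, SMS, SMET
    return math.log(mp), [1.0, es, ts, ms, metal, sms, smet]


# Illustrative values only, not a catalog entry
y, x = model_b_row(mp=2.0, es=4.5, ts=5800.0, ms=1.1, metal=0.2, sist=1)
print(round(y, 4), x)
```

With this coding, the coefficient on SMS measures how much the MS slope changes for planets in multiple systems (SIST = 1) relative to single systems (SIST = 0), and likewise the coefficient on SMET for METAL; for SIST = 0 both interaction columns are zero.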



2.2 Verification of the linear regression assumptions (Model B) 

1. Linearity: The model passes all the Ramsey RESET tests for linearity. Moreover, we conclude that the new functional form is more adequate than the one in Model A.

2. Omitted Variables: The tests indicate that the variables ES and TS should be excluded. However, excluding them does not preserve linearity and discards important physical information about the host star, so we keep them in the model.

3. Multicollinearity: There is no correlation among the independent variables.

4. Heteroskedasticity: The White test on the residuals indicates that they are heteroskedastic.

5. Normality: The value of the Jarque-Bera statistic does not reject normality of the residuals.

6. Homogeneity: The model already includes the effect of the dummy variable SIST.
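Item 4 is based on White's test: the squared OLS residuals are regressed on the original regressors, their squares and cross-products, and LM = nR² from that auxiliary regression is compared with a χ² distribution. The sketch below is a simplified illustration restricted to a single regressor x, so the auxiliary regressors are x and x² and the χ²(2) p-value has the closed form exp(-LM/2); with the regressors of Model B the full test needs more auxiliary terms and degrees of freedom. The data are synthetic.

```python
import math
import random


def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _residuals(X, y):
    """Residuals of the OLS fit of y on the columns of X (X includes the constant)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = _solve(xtx, xty)
    return [v - sum(bi * xi for bi, xi in zip(b, r)) for r, v in zip(X, y)]


def white_test_one_regressor(x, y):
    """White's test for y = a1 + a2 x + u: regress u^2 on (1, x, x^2), LM = n R^2 ~ chi2(2)."""
    n = len(x)
    u2 = [e * e for e in _residuals([[1.0, xi] for xi in x], y)]
    v = _residuals([[1.0, xi, xi * xi] for xi in x], u2)
    mean_u2 = sum(u2) / n
    r2 = 1.0 - sum(e * e for e in v) / sum((w - mean_u2) ** 2 for w in u2)
    lm = n * r2
    return lm, math.exp(-lm / 2.0)  # survival function of chi-square(2)


random.seed(2)
x = [random.uniform(1.0, 10.0) for _ in range(400)]
y_hetero = [1.0 + 0.5 * xi + random.gauss(0.0, 0.2 * xi) for xi in x]  # spread grows with x
y_homo = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]         # constant spread
print(white_test_one_regressor(x, y_hetero))
print(white_test_one_regressor(x, y_homo))
```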
Model B (log-linear) is slightly better than Model A in the sense that it corrects some of the violations of the basic assumptions detected earlier: its residuals are now consistent with normality, although they are heteroskedastic. However, Model B cannot yet be considered an explanation of the relation between an exoplanet and its host star. The relevance of the "dummy" variable points to another type of model, and this binary behavior is discussed in the next section.

3 Multiple Regression Analysis with Binary Dependent Variables: a different approach

The exoplanet, which plays the role of the dependent variable, is determined simultaneously by several parameters, both qualitative and quantitative. So far we have assumed that the mass Mp is the quantitative variable that represents all the physical and orbital characteristics of the planet. This assumption is not entirely true, and more qualitative information must be taken into account in the model.

Such qualitative information can be captured by defining binary (zero-one) variables. One example is SIST, introduced in Section 2, which records whether the exoplanet belongs to a single or a multiple planetary system: SIST = 0 if the exoplanet belongs to a single planetary system and SIST = 1 otherwise.
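SIST can be derived directly from the catalog by counting how many planets share each host star. A minimal sketch, assuming each planet record carries a host-star identifier (the labels below are hypothetical, not catalog names); the dummy reflects only the planets known at the catalog date, June 2008 here:

```python
from collections import Counter


def sist_dummy(host_names):
    """SIST for each planet: 0 if its host star has exactly one known planet, else 1."""
    planets_per_host = Counter(host_names)
    return [0 if planets_per_host[h] == 1 else 1 for h in host_names]


# Hypothetical host labels, one entry per planet
hosts = ["star_A", "star_B", "star_B", "star_C", "star_B"]
print(sist_dummy(hosts))  # [0, 1, 1, 0, 1]
```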

Under this new approach several binary models can be employed, and the choice depends on the distribution of the data. For example, for a normal distribution we apply the probit model, for a logistic distribution the logit model, and when the data are truncated or censored the tobit model.

Once the model is selected, its parameters can be estimated with traditional methods such as maximum likelihood (ML) and ordinary least squares (OLS).
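As an illustration of ML estimation for a binary model, the sketch below fits a logit with one regressor by Newton-Raphson. It is not the authors' Model C, which is deferred to later work: the binary outcome (standing in for something like SIST), the single regressor (standing in for a stellar quantity such as MS) and the true coefficients -4 and 3 used to simulate the data are all hypothetical. A probit would replace the logistic function by the standard normal CDF.

```python
import math
import random


def fit_logit(x, y, iters=25):
    """ML fit of P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score: gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # information matrix (minus the Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + I^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic binary data (hypothetical): true b0 = -4, b1 = 3
random.seed(3)
xs = [random.uniform(0.5, 1.5) for _ in range(2000)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(4.0 - 3.0 * xi)) else 0 for xi in xs]
b0, b1 = fit_logit(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

At the ML solution of a logit with a constant term, the fitted probabilities sum to the number of observed ones, which gives a quick convergence check. The sketch has no safeguard against perfectly separated data, where the ML estimates do not exist.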

A general binary model (Model C) for this case can take the form: 

$$M_P = \alpha_1 + \alpha_2\,\mathrm{ES} + \alpha_3\,\mathrm{TS} + \alpha_4\,\mathrm{MS} + \alpha_5\,\mathrm{METAL} + u_i \qquad (3)$$

The binary version of Model C will be discussed elsewhere. Recently, a logit model was developed and validated by Fressin et al. (2009), who performed a logistic regression to model the probability that a given planet is "real" (that is, observed or detected) rather than simulated.

4 Summary and Conclusions 

From our statistical analysis we conclude that Model B is better than Model A. We improved its specification by deleting variables such as MAG, DS and RS and by adding new ones that account for the possibility of finding exoplanets in single or multiple planetary systems. At the moment, Model B is our best representation of the relation between an exoplanet and its host star; in future work we will address the problem by including binary variables.